Replace sort implementations #124032
r? thomcc
@bors try @rust-timer queue
@bors try (looks like the try build didn't get through)
This PR replaces the sort implementations with tailor-made ones that strike a balance of run-time, compile-time and binary-size, yielding run-time and compile-time improvements. It regresses binary-size for `slice::sort` while improving it for `slice::sort_unstable`. All of this while upholding the existing soft and hard safety guarantees, and even extending the soft guarantees by detecting strict weak ordering violations with a high chance and reporting them to users via a panic.

* `slice::sort` -> driftsort [design document](https://github.com/Voultapher/sort-research-rs/blob/main/writeup/driftsort_introduction/text.md), includes detailed benchmarks and analysis.
* `slice::sort_unstable` -> ipnsort [design document](https://github.com/Voultapher/sort-research-rs/blob/main/writeup/ipnsort_introduction/text.md), includes detailed benchmarks and analysis.

#### Why should we change the sort implementations?

In the [2023 Rust survey](https://blog.rust-lang.org/2024/02/19/2023-Rust-Annual-Survey-2023-results.html#challenges), one of the questions was: "In your opinion, how should work on the following aspects of Rust be prioritized?". The second place was "Runtime performance" and the third one "Compile Times". This PR aims to improve both.

#### Why is this one big PR and not multiple?

* The current documentation gives performance recommendations for `slice::sort` and `slice::sort_unstable`. If, for example, only one of them were to be changed, this advice would be misleading for some Rust versions. By replacing them atomically, the advice remains largely unchanged, and users don't have to change their code.
* driftsort and ipnsort share a substantial part of their implementations.
* The implementation of `select_nth_unstable` uses internals of `slice::sort_unstable`, which makes it impractical to split changes.

---

This PR is a collaboration with @orlp.
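To make the "strict weak ordering violations" point concrete, here is a minimal sketch of what such a violation looks like from the user's side. The helper `is_consistent_order` is hypothetical (it is not part of the standard library or of this PR); it brute-force checks a comparator for reflexivity, antisymmetry and transitivity over a small set of values. The `broken` comparator is an assumed example of a common mistake with `NaN`.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: returns true if `cmp` behaves consistently on every
/// pair and triple drawn from `items`. Cubic cost, so only for tiny inputs.
fn is_consistent_order<T, F>(items: &[T], mut cmp: F) -> bool
where
    F: FnMut(&T, &T) -> Ordering,
{
    for a in items {
        // Reflexivity: an element must compare equal to itself.
        if cmp(a, a) != Ordering::Equal {
            return false;
        }
        for b in items {
            // Antisymmetry: swapping the arguments must reverse the result.
            if cmp(a, b) != cmp(b, a).reverse() {
                return false;
            }
            for c in items {
                let (ab, bc, ac) = (cmp(a, b), cmp(b, c), cmp(a, c));
                // Transitivity of "less than" and of "equivalent".
                if ab == bc && ab != Ordering::Greater && ac != ab {
                    return false;
                }
            }
        }
    }
    true
}

fn main() {
    // Assumed example of a broken comparator: NaN is "less" in both directions.
    let broken = |a: &f64, b: &f64| a.partial_cmp(b).unwrap_or(Ordering::Less);

    let plain: [f64; 3] = [3.0, 1.0, 2.0];
    let with_nan: [f64; 3] = [3.0, f64::NAN, 1.0];

    println!("broken, no NaN:   {}", is_consistent_order(&plain, broken));
    println!("broken, with NaN: {}", is_consistent_order(&with_nan, broken));
    // `f64::total_cmp` defines a total order, including for NaN.
    println!("total_cmp:        {}", is_consistent_order(&with_nan, f64::total_cmp));
}
```

A comparator like `broken` is the kind of input for which, per the description above, the new implementations may panic instead of silently returning an arbitrary order.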
☀️ Try build successful - checks-actions
Finished benchmarking commit (a4132fa): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage)
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size
Results
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 679.119s -> 683.166s (0.60%)
The binary-size results are in line with what we expected, given that
Regarding compile-times, we looked at those in our analysis here and here. The instruction count metric does not account for gains via parallelizability, right?
One thing that I find particularly strange is a 4.31% regression in
Finished benchmarking commit (ae5dc93): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage)
Results (primary 2.6%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles
Results (primary 4.0%, secondary 3.0%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size
Results (primary 1.5%, secondary 2.5%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 673.177s -> 678.436s (0.78%)
So it went from 54 affected cases with a mean regression of 1.8% before, to 47 affected cases with a mean regression of 1.8% now. The baseline master is different, so these numbers are not directly comparable, but the end effect remains mostly the same. Honestly, the thing that suffered the most is the readability of the small-sort selection.
EDIT: I suspect this compile time benchmark is more sensitive to
EDIT2: Running our own compile-time benchmarks for
Before:
After: (with fixed CopyMarker)
So a noticeable regression.
I think this should resolve the last open point, @thomcc right?
Looks good to me. Thanks for all the work on this. @bors r+ rollup=never
@bors r- needs confirmation that the Miri test does not get too slow
The changes yielded only a limited improvement to the currently small Miri coverage, and to the test coverage of the sort implementations in general. But they exploded test times from ~13s to ~240s, which is not deemed worth it.
I've reverted the panic_safety test changes, with the reasoning given in the commit message and the change conversation.
Doing a sanity test, I noticed that with the recent changes hot-u64-random-10000 went from ~81us to ~99us, which means something went wrong when refactoring the
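For scale, going from ~81us to ~99us is (99 - 81) / 81 ≈ 22% slower. A rough sanity check in this spirit can be sketched with only the standard library. This is not the benchmark harness used for the numbers above: the names `xorshift64`, `random_u64s` and `mean_sort_time`, the seed, the iteration count, and the choice of `sort_unstable` are all assumptions made for illustration.

```rust
use std::time::{Duration, Instant};

/// Tiny xorshift64 generator so the sketch needs no external crates.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Builds `len` pseudo-random u64 values from a fixed seed (zero is bumped
/// to one, because xorshift gets stuck at zero).
fn random_u64s(len: usize, seed: u64) -> Vec<u64> {
    let mut state = seed.max(1);
    (0..len).map(|_| xorshift64(&mut state)).collect()
}

/// Sorts a fresh copy of `input` `iters` times and returns the mean time
/// per sort. The copy is made outside the timed region.
fn mean_sort_time(input: &[u64], iters: u32) -> Duration {
    let mut total = Duration::ZERO;
    for _ in 0..iters {
        let mut v = input.to_vec();
        let start = Instant::now();
        v.sort_unstable();
        total += start.elapsed();
        // Keep the optimizer from discarding the sorted result.
        std::hint::black_box(&v);
    }
    total / iters
}

fn main() {
    let input = random_u64s(10_000, 0x9E37_79B9_7F4A_7C15);
    let mean = mean_sort_time(&input, 200);
    println!("mean time per sort of 10000 random u64: {:?}", mean);
}
```

Running this in release mode before and after a change gives a coarse signal only; a proper benchmark harness controls for warm-up, noise and CPU frequency scaling.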
Due to refactoring the const_trait usage, the CopyMarker impl was accidentally deleted, which had the consequence that the Copy specialization for the small-sort was never picked.
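The failure mode described here, where a deleted marker impl silently selects the slower path instead of causing a compile error, can be illustrated with a stable-Rust analogy. This is not the code from the PR: `SmallSortHint`, `USE_COPY_FAST_PATH` and `small_sort_path` are hypothetical names, and an associated constant with a default stands in for the real mechanism.

```rust
/// Hypothetical marker trait. Types opt in to the fast path by overriding
/// the constant; the default keeps them on the generic path.
trait SmallSortHint {
    const USE_COPY_FAST_PATH: bool = false;
}

// Opt-in for u64. If this override were accidentally reduced to an empty
// `impl SmallSortHint for u64 {}`, the code would still compile and u64
// would quietly fall back to the generic path.
impl SmallSortHint for u64 {
    const USE_COPY_FAST_PATH: bool = true;
}

// String stays on the default, generic path.
impl SmallSortHint for String {}

/// Reports which path would be chosen for `T`.
fn small_sort_path<T: SmallSortHint>() -> &'static str {
    if T::USE_COPY_FAST_PATH {
        "copy-specialized"
    } else {
        "generic"
    }
}

fn main() {
    println!("u64    -> {}", small_sort_path::<u64>());
    println!("String -> {}", small_sort_path::<String>());
}
```

Because nothing fails to compile when the opt-in disappears, only a run-time measurement such as the sanity test mentioned above reveals the problem.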
Please let's re-run the timer run; I suspect we will now see slightly more pronounced changes than before.
@bors try @rust-timer queue
⌛ Trying commit b7deff3 with merge 24b359fb97924de5ddc5080af6a22ee7b5147b06...
☀️ Try build successful - checks-actions
Finished benchmarking commit (24b359f): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

Instruction count
This is a highly reliable metric that was used to determine the overall result at the top of this comment.

Max RSS (memory usage)
Results (primary 2.4%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Cycles
Results (primary 3.3%, secondary -0.6%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Binary size
Results (primary 1.6%, secondary 2.4%)
This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 670.379s -> 677.734s (1.10%)
Ok, so the mean regression of the affected cases went from 1.8% to 1.9%. I'd say the previous reasoning doesn't change.
I think there should be no more outstanding issues; from my side it's good to merge.
@RalfJung is it fine from Miri's end now?
This no longer has any Miri-related changes, so -- yes.